feat: safetensors support — BF16 streaming indexer#60

Merged
AdaWorldAPI merged 3 commits into master from claude/safetensors-support on Mar 30, 2026


@AdaWorldAPI
Owner

What

Adds safetensors format support to the streaming indexer. Same pattern as GGUF: parse header → iterate tensors → project rows → write bgz7.

Files

  • safetensors.rs (new, 414 lines)

    • read_safetensors_header() → parses the JSON header, produces GgufFile-compatible types
    • stream_index_safetensors_bf16() → thin wrapper: parse header → delegate to stream_index_gguf_bf16_with_header
    • Minimal JSON parser (no serde dependency): extracts dtype, shape, data_offsets
    • Tests: synthetic safetensors roundtrip, dtype parsing, JSON extraction
    • Integration test: test_stream_index_qwen35_safetensors (11 shards, ~55 GB)
  • gguf_indexer.rs (refactored)

    • Extracted stream_index_gguf_bf16_with_header() — the core loop, format-agnostic
    • stream_index_gguf_bf16() now just parses the GGUF header and delegates
    • No behavior change for existing callers
  • mod.rs — registered safetensors module
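For orientation, the header-reading step can be sketched as follows. This is a minimal sketch that assumes the standard safetensors layout (an 8-byte little-endian header length, then that many bytes of UTF-8 JSON, then the tensor data buffer). The name `read_header_json` and the `max_len` guard are hypothetical and are not taken from the PR's `read_safetensors_header()`.

```rust
use std::io::{self, Read};

/// Reads the raw JSON header from a safetensors stream.
/// Returns the JSON text and the absolute file offset where tensor data begins.
/// `max_len` is a hypothetical sanity limit so a corrupt length cannot trigger
/// a huge allocation.
pub fn read_header_json<R: Read>(reader: &mut R, max_len: u64) -> io::Result<(String, u64)> {
    // First 8 bytes: header length as a little-endian u64.
    let mut len_bytes = [0u8; 8];
    reader.read_exact(&mut len_bytes)?;
    let header_len = u64::from_le_bytes(len_bytes);
    if header_len > max_len {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "safetensors header length exceeds limit",
        ));
    }

    // Next `header_len` bytes: the JSON header itself.
    let mut buf = vec![0u8; header_len as usize];
    reader.read_exact(&mut buf)?;
    let json = String::from_utf8(buf)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;

    // `data_offsets` in the header are relative to this position.
    let data_start = 8 + header_len;
    Ok((json, data_start))
}
```

A caller would add `data_start` to each tensor's first `data_offsets` entry to get an absolute file offset before handing the tensor list to the shared indexing loop.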

Why safetensors for the reasoning diff

The Qwen3.5 base and distilled models are available as BOTH:

  • GGUF Q8_0 (28.59 GB, 8-bit quantized)
  • Safetensors (55 GB, full BF16 precision)

Indexing at BF16 gives cleaner Base17 fingerprints — Q8_0 introduces quantization noise before the golden-step projection. For causal diffing, less noise = sharper NARS truth values = more reliable reasoning scaffold detection.
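The noise argument can be illustrated with a small round-trip comparison. This is a simplified sketch, not the PR's code: it assumes a Q8_0-style scheme with one shared scale per block (`max|x| / 127`) and keeps the scale in f32, whereas GGUF Q8_0 uses blocks of 32 values with an f16 scale. All function names here are hypothetical.

```rust
/// Decodes one BF16 value: BF16 is the upper 16 bits of an f32.
pub fn bf16_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Encodes f32 as BF16 by truncation (real converters usually round to nearest even).
pub fn f32_to_bf16_trunc(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

/// Quantizes a block to 8-bit integers with one shared scale, then dequantizes.
pub fn q8_0_round_trip(block: &[f32]) -> Vec<f32> {
    let amax = block.iter().fold(0.0f32, |m, x| m.max(x.abs()));
    if amax == 0.0 {
        return vec![0.0; block.len()];
    }
    let scale = amax / 127.0;
    block
        .iter()
        .map(|x| (x / scale).round().clamp(-127.0, 127.0) * scale)
        .collect()
}

/// Largest absolute element-wise difference between two slices.
pub fn max_abs_error(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .fold(0.0f32, |m, (x, y)| m.max((x - y).abs()))
}
```

Weights that are already stored as BF16 decode without loss, while the quantized round trip perturbs every value by up to half a quantization step and flushes values much smaller than the block maximum to zero. That perturbation is what reaches the projection when indexing from Q8_0.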

Architecture

GGUF:         read_gguf_header()         ─┐
                                          ├→ stream_index_gguf_bf16_with_header()
Safetensors:  read_safetensors_header()  ─┘   (shared: chunked BF16, F64x8 SIMD,
                                               halftone, tail deletion)

One core pipeline, two header parsers. Syntax-checked with rustc 1.94.1.
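The shape of this design can be sketched as a core loop that only sees a pre-parsed header. The types below (`TensorDesc`, `Header`) and the function `index_with_header` are hypothetical stand-ins, not the PR's GgufFile-compatible types, and the loop only decodes BF16 rows; the SIMD projection, halftone, and tail-deletion steps are omitted.

```rust
/// Hypothetical format-neutral tensor descriptor (2-D, BF16 only for brevity).
#[derive(Debug, Clone, PartialEq)]
pub struct TensorDesc {
    pub name: String,
    pub rows: usize,
    pub cols: usize,
    /// Byte offset relative to `Header::data_start`.
    pub offset: u64,
}

/// Hypothetical pre-parsed header that either format's parser could produce.
pub struct Header {
    pub data_start: u64,
    pub tensors: Vec<TensorDesc>,
}

/// Core loop: walks every tensor row by row, decodes BF16 to f32, and hands
/// each row to `on_row`. It never inspects the container format.
pub fn index_with_header(
    header: &Header,
    file: &[u8],
    mut on_row: impl FnMut(&str, usize, &[f32]),
) -> Result<usize, String> {
    let mut rows_seen = 0usize;
    let mut row: Vec<f32> = Vec::new();
    for t in &header.tensors {
        let start = (header.data_start + t.offset) as usize;
        let row_bytes = t.cols * 2; // BF16 is two bytes per element
        for r in 0..t.rows {
            let lo = start + r * row_bytes;
            let bytes = file
                .get(lo..lo + row_bytes)
                .ok_or_else(|| format!("tensor {} row {} is out of bounds", t.name, r))?;
            row.clear();
            row.extend(bytes.chunks_exact(2).map(|b| {
                f32::from_bits((u16::from_le_bytes([b[0], b[1]]) as u32) << 16)
            }));
            on_row(t.name.as_str(), r, row.as_slice());
            rows_seen += 1;
        }
    }
    Ok(rows_seen)
}
```

Under this arrangement, adding a third container format would mean writing one more parser that fills in `Header`, with no change to the loop.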

Parses the safetensors JSON header (no serde dependency) and produces
GgufFile-compatible types so stream_index_gguf_bf16_with_header works
unchanged on safetensors files.

Safetensors stores full BF16 weights — no quantization noise.
For the reasoning diff pipeline, BF16→Base17 gives cleaner
fingerprints than Q8_0→f32→Base17.

Includes test_stream_index_qwen35_safetensors for 11-shard
Qwen3.5-27B indexing at full BF16 precision.
…tic indexing

Splits stream_index_gguf_bf16 into:
- stream_index_gguf_bf16(): parses GGUF header, delegates to _with_header
- stream_index_gguf_bf16_with_header(): the core loop, works with any
  pre-parsed header (GGUF or safetensors)

No behavior change for existing callers.
@AdaWorldAPI merged commit 43cfad0 into master on Mar 30, 2026
4 of 10 checks passed
@AdaWorldAPI deleted the claude/safetensors-support branch on March 30, 2026 09:43

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7ab54615e1

Comment thread src/hpc/safetensors.rs
Comment on lines +158 to +162
let shape = extract_json_array_u64(obj_str, "shape").unwrap_or_default();

// Extract data_offsets
let offsets = extract_json_array_u64(obj_str, "data_offsets").unwrap_or_default();
let offset = if offsets.len() >= 1 { offsets[0] } else { 0 };
P1: Fail fast when required tensor fields are missing

This code silently substitutes defaults ([] for shape, 0 for data_offsets) when parsing fails, then still emits a TensorInfo. On a truncated/corrupted or slightly non-conforming safetensors header, that can make multiple tensors read from the wrong byte range (often offset 0) and produce a seemingly successful but corrupted index instead of returning an error. Required fields for each tensor should be validated and parsing should fail if they are absent or invalid.
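One way to apply this suggestion is sketched below, under stated assumptions. The `extract_json_array_u64` here is a simplified local stand-in with the same name and `Option` return as the helper in the reviewed lines; its real implementation is not shown in this thread. `HeaderError` and `parse_tensor_fields` are hypothetical names.

```rust
/// Hypothetical error type naming the tensor and the field that failed.
#[derive(Debug, PartialEq)]
pub enum HeaderError {
    MissingField { tensor: String, field: &'static str },
    BadOffsets { tensor: String },
}

/// Simplified stand-in: finds `"key": [n, n, ...]` in one tensor's JSON object.
fn extract_json_array_u64(obj: &str, key: &str) -> Option<Vec<u64>> {
    let needle = format!("\"{}\"", key);
    let after_key = &obj[obj.find(&needle)? + needle.len()..];
    let after_colon = after_key.trim_start().strip_prefix(':')?.trim_start();
    let body = after_colon.strip_prefix('[')?;
    let body = &body[..body.find(']')?];
    if body.trim().is_empty() {
        return Some(Vec::new()); // `[]` is a valid scalar shape
    }
    // Any element that is not a u64 makes the whole array invalid.
    body.split(',').map(|s| s.trim().parse::<u64>().ok()).collect()
}

/// Returns (shape, begin, end), or an error instead of silently defaulting.
pub fn parse_tensor_fields(name: &str, obj: &str) -> Result<(Vec<u64>, u64, u64), HeaderError> {
    let shape = extract_json_array_u64(obj, "shape").ok_or_else(|| HeaderError::MissingField {
        tensor: name.to_string(),
        field: "shape",
    })?;
    let offsets =
        extract_json_array_u64(obj, "data_offsets").ok_or_else(|| HeaderError::MissingField {
            tensor: name.to_string(),
            field: "data_offsets",
        })?;
    // data_offsets must be exactly [begin, end] with begin <= end.
    match offsets.as_slice() {
        [begin, end] if begin <= end => Ok((shape, *begin, *end)),
        _ => Err(HeaderError::BadOffsets { tensor: name.to_string() }),
    }
}
```

A stricter variant could also check that `end - begin` equals the element count implied by `shape` times the dtype size, which would catch a header whose offsets are present but inconsistent.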
